Agenda

Text Processing

See Python file

In [1]:
import pandas as pd
import numpy as np
import urllib.request
import re
from textblob import TextBlob
%run lib.py  # helpers used below: nextbigchunk, getstopwords, getdialogue, getsentiment, maverage, alllower
In [2]:
# Uncomment exactly one script name to analyze
#name="Legally%20Blonde"
#name="aboutmary"
#name="10Things"
name="magnolia"
#name="Friday%20The%2013th"
#name="Ghost%20Ship"
#name="Juno"
#name="Reservoir+Dogs"
#name="shawshank"
#name="Sixth%20Sense,%20The"
#name="sunset_bld_3_21_49"
#name="Titanic"
#name="toy_story"
#name="trainspotting"
#name="transformers"
#name="the-truman-show_shooting"
#name="batman_production"
In [3]:
ext="html"
txtfiles=["Ghost%20Ship", "Legally%20Blonde", "Friday%20The%2013th", "Juno", "Reservoir+Dogs", "Sixth%20Sense,%20The", "Titanic"]
if name in txtfiles:
    ext="txt"
# Download the script, then strip carriage returns and HTML tags
with urllib.request.urlopen("http://www.dailyscript.com/scripts/"+name+"."+ext) as fp:
    mybytes = fp.read()
mystr = mybytes.decode("utf8", "ignore")
liston = mystr.split("\n")
liston = [s.replace('\r', '') for s in liston]
liston = [re.sub('<[^<]+?>', '', text) for text in liston]
In [4]:
if name=="shawshank":
    liston=[i.replace("\t", "    ") for i in liston]
In [5]:
char=""
script=[]
charintro=""
endofdialogue=""
dialoguepre=""
newscenepre=""
i=45  # start detection past the title page
print("Characters")
i, charintro=nextbigchunk(liston, i)
print("Adverbs")
i, adverb=nextbigchunk(liston, i, adverbs=True)
print("Dialogues")
i, dialoguepre=nextbigchunk(liston, i)
print("New Scene:")
i, newscenepre=nextbigchunk(liston, i)

if newscenepre=="X":
    # Detection failed: retry further into the script
    i=100
    i, newscenepre=nextbigchunk(liston, i)
    if name=="aboutmary":
        newscenepre=" "*55  # hand-tuned indentation for this script
    if len(newscenepre)==len(charintro):
        newscenepre="X"

endofdialogue=newscenepre
    

scene=1
for s in liston:
    stripped=s.strip()
    # Character cue: text at the charintro indent that is not a parenthetical
    if len(s)>len(charintro) and s[0:len(charintro)]==charintro and s[len(charintro)]!=" " and stripped[0]!="(" and stripped[-1]!=")":
        char=s[len(charintro):]
        new=dict()
        new['char']=char.strip()
        new['dialogue']=""
        new['scene']=scene
        new['adverb']=""
    # Blank line (or the end-of-dialogue marker) closes the current entry
    if s==endofdialogue or s.replace(" ", "")=="":
        if char!="":
            char=""
            script.append(new)
    # Dialogue line at the dialoguepre indent
    if char!="" and len(s)>len(dialoguepre) and s[0:len(dialoguepre)]==dialoguepre and s[len(dialoguepre)]!=" ":
        if new['dialogue']!="":
            new['dialogue']=new['dialogue']+" "
        new['dialogue']=new['dialogue']+s[len(dialoguepre):]
    # Parenthetical / adverb attached to the current speaker
    if char!="" and ((len(s)>len(adverb) and s[0:len(adverb)]==adverb and s[len(adverb)]!=" ") or (len(stripped)>1 and stripped[0]=="(" and stripped[-1]==")")):
        if new['adverb']!="":
            new['adverb']=new['adverb']+" "
        new['adverb']=new['adverb']+s[len(adverb):]
    # All-caps line at the newscenepre indent starts a new scene
    if s[0:len(newscenepre)]==newscenepre and len(s)>len(newscenepre) and s.isupper() and s[len(newscenepre)]!=" ":
        scene=scene+1
Characters
                                magnolia
                                NARRATOR
                                NARRATOR
                                NARRATOR
                                NARRATOR
                                NARRATOR
Adverbs
Dialogues
                      In the New York Herald, November 26,
                      year 1911, there is an account of the
                      hanging of three men --
                      ...they died for the murder of
                      Sir Edmund William Godfrey --
                      -- Husband, Father, Pharmacist and all
New Scene:
     a P.T. Anderson picture                             11/10/98
     a Joanne Sellar/Ghoulardi Film Company production
     
     
     
     
In [6]:
pd.DataFrame(script).to_csv(name+'.csv', index=None)
pd.DataFrame(script)
Out[6]:
adverb char dialogue scene
0 magnolia 1
1 NARRATOR In the New York Herald, November 26, year 1911... 2
2 NARRATOR ...they died for the murder of Sir Edmund Will... 2
3 NARRATOR -- Husband, Father, Pharmacist and all around ... 2
4 NARRATOR Greenberry Hill, London. Population as listed. 3
5 NARRATOR He was murdered by three vagrants whose motive... 5
6 NARRATOR ...Joseph Green..... 5
7 NARRATOR ...Stanley Berry.... 5
8 NARRATOR ...and Nigel Hill... 5
9 NARRATOR Green, Berry and Hill. 7
10 NARRATOR ...And I Would Like To Think This Was Only A M... 7
11 NARRATOR As reported in the Reno Gazzette, June of 1983... 9
12 NARRATOR --- the water that it took to contain the fire -- 10
13 NARRATOR -- and a scuba diver named Delmer Darion. 12
14 NARRATOR Employee of the Peppermill Hotel and Casino, R... 15
15 NARRATOR -- well liked and well regarded as a physical,... 16
16 NARRATOR -- as reported by the coroner, Delmer died of ... 21
17 NARRATOR ...volunteer firefighter, estranged father of ... 24
18 NARRATOR -- added to this, Mr. Hansen's tortured life m... 26
19 CRAIG HANSEN ...oh God...fuck...I'm sorry...I'm sorry... 27
20 NARRATOR The weight of the guilt and the measure of coi... 27
21 CRAIG HANSEN ...forgive me... 27
22 NARRATOR And I Am Trying To Think This Was All Only A M... 29
23 NARRATOR The tale told at a 1961 awards dinner for the ... 32
24 NARRATOR Seventeen year old Sydney Barringer. In the ci... 33
25 NARRATOR The coroner ruled that the unsuccessful suicid... 33
26 NARRATOR The suicide was confirmed by a note, left in t... 34
27 NARRATOR At the same time young Sydney stood on the le... 35
28 NARRATOR The neighbors heard, as they usually did, the... 36
29 NARRATOR -- and it was not uncommon for them to threat... 37
... ... ... ... ...
1493 DIXON We gotta get his money so we can get outta her... 382
1494 WORM That idea is over now. We're not gonna do tha... 382
1495 (to Stanley) DIXON DADDY, FUCK, DADDY, DON'T GET MAD AT ME. DON'T... 382
1496 WORM I'm not mad, son, I will not be mad at you an... 382
1497 DIXON DAD. 382
1498 DIXON I - just - thought - that - I - didn't want - ... 382
1499 WORM It's ok, boy. 382
1500 MUSIC/KERMIT THE FROG "It's not that easy bein' green... Having to s... 383
1501 DONNIE My teeff...my teeef.... 385
1502 JIM KURRING YOU'RE OK...you're gonna be ok.... 385
1503 NARRATOR And there is the account of the hanging of thr... 390
1504 NARRATOR There are stories of coincidence and chance an... 391
1505 NARRATOR ...and we generally say, "Well if that was in... 392
1506 DOCTOR Are you with us? Linda? Is it Linda? 394
1507 NARRATOR Someone's so and so meet someone else's so and... 395
1508 NARRATOR And it is in the humble opinion of this narrat... 398
1509 STANLEY Dad...Dad. 399
1510 STANLEY You have to be nicer to me, Dad. 399
1511 RICK Go to bed. 399
1512 STANLEY I think that you have to be nicer to me. 399
1513 RICK Go to bed. 399
1514 NARRATOR ...and so it goes and so it goes and the book... 400
1515 MARCIE I killed him. I killed my husband. He hit my... 401
1516 DONNIE I know that I did a thtupid thing. Tho-thtupid... 402
1517 DONNIE I really do hath love to give, I juth don't kn... 402
1518 JIM KURRING ...these security systems can be a real joke. ... 403
1519 DONNIE ....ohh-thur-I-thur-thill.... 403
1520 JIM KURRING You guys make alotta money, huh? 403
1521 (beat) JIM KURRING ...alot of people think this is just a job tha... 405
1522 END. 406

1523 rows × 4 columns

In [7]:
magnolia=pd.read_csv(name+'.csv')
stopwords = getstopwords()
In [8]:
removedchars=["'S VOICE", "'S WHISPER VOICE", " GATOR"]
for s in removedchars:
    magnolia['char']=magnolia['char'].apply(lambda x: x.replace(s, ""))
# Map each scene to the set of characters who speak in it
scenes=dict()
for _, row in magnolia.iterrows():
    scenes.setdefault(row['scene'], []).append(row['char'])
for k in scenes:
    scenes[k]=list(set(scenes[k]))
In [9]:
characters=list({c for chars in scenes.values() for c in chars})
appearances=dict.fromkeys(characters, 0)
for _, row in magnolia.iterrows():
    appearances[row['char']]=appearances[row['char']]+1
In [10]:
a=pd.Series(appearances)  # character -> number of dialogue entries
In [11]:
# Top 10 characters by number of dialogue entries
finalcharacters=list(a.sort_values(ascending=False)[0:10].index)
In [12]:
finalcharacters
file=open(name+"_nodes.csv", "w")
couplesappearances=dict()
# Header row: one column per character
for s in finalcharacters:
    file.write(";")
    file.write(s)
file.write("\n")
# Matrix cell (f, s): dialogue lines in scenes where both f and s appear
for s in finalcharacters:
    newlist=[]
    for f in finalcharacters:
        newlist.append(0)
        couplesappearances[f+"_"+s]=0
    j=0
    for f in finalcharacters:
        for p in scenes:
            # Count each pair once (f before s in the top-10 ordering)
            if f in scenes[p] and s in scenes[p] and f!=s and finalcharacters.index(f)<finalcharacters.index(s):
                long=len(magnolia[magnolia["scene"]==p])  # dialogue lines in scene p
                newlist[j]=newlist[j]+long
                couplesappearances[f+"_"+s]=couplesappearances[f+"_"+s]+long
        j=j+1
    file.write(s)
    for f in newlist:
        file.write(";")
        file.write(str(f))
    file.write("\n")
file.close()
In [13]:
a=pd.Series(couplesappearances)
# Top 4 pairs by shared-scene dialogue lines
finalcouples=list(a.sort_values(ascending=False)[0:4].index)
In [14]:
file=open(name+"_finalcharacters.csv", "w")
for s in finalcharacters:
    file.write(s+"\n")
file.close()
file=open(name+"_finalcouples.csv", "w")
for s in finalcouples:
    file.write(s+"\n")
file.close()
In [15]:
importantchars=[]
for char in appearances:
    if appearances[char]>10:
        importantchars.append(char)
In [16]:
%matplotlib inline
import matplotlib.pyplot as plt

file=open(name+"_sentiment_overtime_individual.csv", "w")
file2=open(name+"_sentiment_overtime_individualminsmaxs.csv", "w")

for k in finalcharacters:
    print(k)
    dd=getdialogue(magnolia, k, k, scenes)
    dd=[str(d) for d in dd]
    polarities, subjectivities=getsentiment(dd)
    moveda=maverage(polarities, dd, .99)  # smoothed polarity over the script
    plt.plot(moveda)
    i=0
    for s in moveda:
        file.write(k+","+str(float(i)/len(moveda))+", "+str(s)+"\n")
        i=i+1
    plt.ylabel('polarities')
    plt.show()
    file2.write(k+"| MIN| "+dd[moveda.index(np.min(moveda))]+"\n")
    file2.write(k+"| MAX| "+dd[moveda.index(np.max(moveda))]+"\n")
    print("MIN: "+dd[moveda.index(np.min(moveda))])
    print("\n")
    print("MAX: "+dd[moveda.index(np.max(moveda))])

file.close()
file2.close()

file=open(name+"_sentiment_overtime_couples.csv", "w")
file2=open(name+"_sentiment_overtime_couplesminsmaxs.csv", "w")

for k in finalcouples:
    print(k)
    pair=k.split("_")
    dd=getdialogue(magnolia, pair[0], pair[1], scenes)
    dd=[str(d) for d in dd]
    polarities, subjectivities=getsentiment(dd)
    moveda=maverage(polarities, dd, .99)
    plt.plot(moveda)
    i=0
    for s in moveda:
        file.write(k+","+str(float(i)/len(moveda))+", "+str(s)+"\n")
        i=i+1
    plt.ylabel('polarities')
    plt.show()
    file2.write(k+"| MIN| "+dd[moveda.index(np.min(moveda))]+"\n")
    file2.write(k+"| MAX| "+dd[moveda.index(np.max(moveda))]+"\n")
    print("MIN: "+dd[moveda.index(np.min(moveda))])
    print("\n")
    print("MAX: "+dd[moveda.index(np.max(moveda))])

file.close()
file2.close()
JIM KURRING
MIN: You mind if I check things back here? 


MAX: YOU'RE OK...you're gonna be ok....
JIMMY
MIN: She went crazy.  She went crazy, Rose. 


MAX: Imagine you are attending a jam session of classical composers and they have  each done an arrangment of the classic  favorite, "Whispering."  Here are three  variations on the theme, as three classic  composer's might have written it -- you are to name the composer.  The First: 
CLAUDIA
MIN: I'm sorry. 


MAX: Did you ever go out with someone and just....lie....question after question, maybe you're trying to  make yourself look cool or better  than you are or whatever, or smarter  or cooler and you just -- not really lie, but maybe you just don't say everything --
FRANK
MIN: If you feel, made to feel like you need them, like -- like you can't live if you're without them or you need, what?  They're pussy?  They're love? Fuck that.  Self Sufficient, gents.  That's the truth. What you are -- we are -- you need them  for what?  To fucking make you a piece of snot rag?  A puppett?  huh?  Hear them bitch and moan? bitch and moan --  and we're taught one thing -- go the other way -- there is No Excuse I will give you, I'm not gonna apologize -- I'm not gonna  apologize for my NEED my DESIRE...my, the  things that I need as a man to feel comfortable... You understand?  You understand?  You need to say something, "my mommy hit me or  daddy hit me or didn't let me play soccer,  so now I make mistakes, cause a that -- something, so now I piss and shit on it and do this." Bullshit.  I'm sorry. ok. yeah. no. fuck.  go.  fuck. alright. go make a new mistake. maybe not, I dunno...fuck.... 


MAX: I wouldn't want that to be misunderstood: My enrollment was totally unoffical because I was, sadly, unable to afford tuition up  there.  But there were three wonderful men who were kind enough to let me sit in on their classes, and they're names are:  Macready, Horn and Langtree among others. I was completely independent financially, and like I said: One Sad Sack A Shit.  So what we're looking at here is a true rags to riches story and I think that's  what most people respond to in "Seduce," And At The End Of The Day? Hey -- it may not  even be about picking up chicks and sticking your cock in it -- it's about finding What You Can Be In This World.  Defining It.  Controling It and  saying: I will take what is mine.  You just happen  to get a blow job out of it, then hey-what-the-fuck- why-not?  he.he.he.
PHIL
MIN: You wanna call him on the phone? We can call him, I can dial the  phone if you can remember the number -- 


MAX: Thank you, Chad, and good luck to you and your mother -- 
STANLEY
MIN: I think that you have to be nicer to me.


MAX: I'm fine. I'm fine, I just wanna keep playing --
DONNIE
MIN: My teeff...my teeef....


MAX: My name is Donnie Smith and I have lot's of love to give. 
EARL
MIN: No, no, the grade...the grade that you're in? 


MAX: "...it's not going to stop 'till you wise up..."
LINDA
MIN: listen...listen to me now, Phil:  I'm sorry, sorry I slapped your face.  ...because I don't know what I'm doing... ...I don't know how to do this, y'know?  You understand?  y'know?  I...I'm...I do things  and I fuck up and I fucked up....forgive me, ok? Can you just...


MAX: I'm listening.  I'm getting better. 
NARRATOR
MIN: -- added to this, Mr. Hansen's tortured life met before with Delmer Darion just two nights previous --


MAX: So Fay Barringer was charged with the  murder of her son and Sydney Barringer  noted as an accomplice in his own death...
JIM KURRING_CLAUDIA
MIN: You mind if I check things back here? 


MAX: ok. 
JIMMY_STANLEY
MIN: I don't mean to cry, I'm sorry. 


MAX: Imagine you are attending a jam session of classical composers and they have  each done an arrangment of the classic  favorite, "Whispering."  Here are three  variations on the theme, as three classic  composer's might have written it -- you are to name the composer.  The First: 
PHIL_EARL
MIN: -- it's not him. it's not him. He's the fuckin' asshole...Phil..c'mere... 


MAX: ...ah...maybe...yeah...she's a good one... 
FRANK_PHIL
MIN: When they put me on hold, to  talk to you...they play the tapes.  I mean: I'd seen the commercials and heard about you, but I'd never heard the tapes ....


MAX: I just...he was...but I gave him,  I just had to give him a small dose of  liquid morphine.  He hasn't been able to swallow the morphine pills so we now,  I just had to go to the liquid morphine... For the pain, you understand? 
In [17]:
# Prefix each character in a scene with "INSCENE_" to mark mere presence
for key in scenes:
    scenes[key]=["INSCENE_"+s for s in scenes[key]]
In [18]:
# Drop rows with missing dialogue (dropna returns a copy, so assign it back)
magnolia=magnolia.dropna(subset=['dialogue'])
In [19]:
baskets=[]
spchars=["\"", "'", ".", ",", "-"]   # punctuation stripped from tokens
attributes=["?", "!"]                # kept as standalone tokens
for s in magnolia.iterrows():
    if type(s[1]['dialogue'])!=float and len(s[1]['dialogue'])>0:
        new=[]
        for k in scenes[s[1]['scene']]:
            new.append(k)
        new.append("SPEAKING_"+s[1]['char'])
        for k in s[1]['dialogue'].split(" "):
            ko=k
            for t in spchars:
                ko=ko.replace(t, "")
            for t in attributes:
                if ko.find(t)>=0:
                    new.append(t)
                    ko=ko.replace(t, "")
            if len(ko)>0:
                new.append(ko.lower())
        new=list(set(new))
        baskets.append(new)
In [20]:
baskets2=[]
basketslist=[]
for k in baskets:
    new=dict()
    new2=[]
    for t in k:
        if t not in stopwords:
            new[t]=1
            new2.append(t)
    baskets2.append(new)
    basketslist.append(new2)
In [21]:
baskets2=pd.DataFrame(baskets2)
from mlxtend.frequent_patterns import apriori
from mlxtend.frequent_patterns import association_rules
baskets2=baskets2.fillna(0)
baskets2.to_csv(name+'_basket.csv')
In [22]:
frequent_itemsets = apriori(baskets2, min_support=5/len(baskets2), use_colnames=True)  # itemsets in at least 5 baskets
rules = association_rules(frequent_itemsets, metric="lift", min_threshold=1)
In [23]:
# Note: older mlxtend releases name this column 'antecedants'; newer ones spell it 'antecedents'
rules['one_lower']=[int(alllower(i) or alllower(j)) for i, j in zip(rules['antecedants'], rules['consequents'])]
In [24]:
rules['both_lower']=[int(alllower(i) and alllower(j)) for i, j in zip(rules['antecedants'], rules['consequents'])]
In [25]:
rules.to_csv(name+'_rules.csv', index=None)

Sentiment Analysis (Movie & Character)

Score per Movie

Title: LEGALLY BLONDE
Distinct words/tokens in the original text: 1900

Sentiment scale from negative to positive: afinn
Description Score % Found Words
From 0 (negative) to 10 (positive) 5.332362 12.8%
Percentage of words found per sentiment type (bing): 14.2%
sentiment Percentage
positive 61.7%
negative 38.3%
Percentage of words found per sentiment type (nrc): 22.1%
sentiment Percentage
positive 19.7%
trust 15.7%
negative 13.5%
anticipation 10.6%
joy 9.4%
fear 8.3%
sadness 7.0%
anger 6.7%
surprise 4.8%
disgust 4.4%
Percentage of words found per sentiment type (loughran): 7.63%
sentiment Percentage
negative 33.3%
litigious 24.0%
positive 22.6%
uncertainty 18.5%
constraining 1.7%
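The afinn score and "% Found Words" columns above come from a straightforward lexicon lookup. A minimal sketch with a toy five-word lexicon — the lexicon below is invented (real AFINN has roughly 2,500 entries), and the exact rescaling from AFINN's [-5, 5] range to the report's [0, 10] axis is an assumption:

```python
import re

# Toy AFINN-style lexicon (a tiny made-up subset, for illustration only)
AFINN = {"luck": 3, "wonderful": 4, "kind": 2, "stupid": -2, "murder": -3}

def afinn_report(text, lexicon=AFINN):
    tokens = re.findall(r"[a-z']+", text.lower())
    hits = [lexicon[t] for t in tokens if t in lexicon]
    if not hits:
        return 5.0, 0.0  # neutral midpoint when nothing matches
    score = (sum(hits) / len(hits)) + 5          # rescale [-5, 5] -> [0, 10]
    found_pct = 100 * len(hits) / len(tokens)    # the "% Found Words" column
    return score, found_pct

score, pct = afinn_report("Good luck. What a wonderful, kind angel. Not stupid.")
```

The bing, nrc, and loughran tables work the same way, except those lexicons assign categorical labels (positive, trust, litigious, ...) instead of numeric scores, so the report tallies label frequencies rather than a mean.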

Score per Character

[1] “Character sentiment analysis: ELLE” [1] “Total unique words in the text: 1095”

Sentiment scale from negative to positive: afinn
Description Score % Found Words
From 0 (negative) to 10 (positive) 5.60473 12.7%
Percentage of words found per sentiment type (bing): 12.6%
sentiment Percentage
positive 64.3%
negative 35.7%
Percentage of words found per sentiment type (nrc): 21.8%
sentiment Percentage
positive 21.2%
trust 17.7%
negative 12.1%
anticipation 11.3%
joy 11.0%
fear 7.4%
sadness 5.9%
anger 5.8%
surprise 4.4%
disgust 3.3%
Percentage of words found per sentiment type (loughran): 7.76%
sentiment Percentage
negative 35.1%
positive 26.5%
litigious 23.2%
uncertainty 14.1%
constraining 1.1%

[1] “Character sentiment analysis: EMMETT” [1] “Total unique words in the text: 295”

Sentiment scale from negative to positive: afinn
Description Score % Found Words
From 0 (negative) to 10 (positive) 4.923077 9.83%
Percentage of words found per sentiment type (bing): 12.2%
sentiment Percentage
positive 58.1%
negative 41.9%
Percentage of words found per sentiment type (nrc): 15.9%
sentiment Percentage
positive 18.2%
negative 15.3%
anticipation 13.1%
trust 13.1%
fear 9.5%
joy 8.0%
sadness 8.0%
anger 5.8%
disgust 5.1%
surprise 3.6%
Percentage of words found per sentiment type (loughran): 5.76%
sentiment Percentage
uncertainty 37.5%
negative 29.2%
positive 29.2%
litigious 4.2%

[1] “Character sentiment analysis: DONOVAN” [1] “Total unique words in the text: 276”

Sentiment scale from negative to positive: afinn
Description Score % Found Words
From 0 (negative) to 10 (positive) 4.954545 11.6%
Percentage of words found per sentiment type (bing): 13%
sentiment Percentage
positive 53.33%
negative 46.67%
Percentage of words found per sentiment type (nrc): 18.5%
sentiment Percentage
trust 14.92%
positive 14.36%
negative 12.71%
anger 10.50%
fear 10.50%
anticipation 8.84%
sadness 8.84%
disgust 6.63%
surprise 6.63%
joy 6.08%
Percentage of words found per sentiment type (loughran): 7.61%
sentiment Percentage
positive 33.3%
litigious 26.7%
uncertainty 23.3%
negative 16.7%

[1] “Character sentiment analysis: WARNER” [1] “Total unique words in the text: 261”

Sentiment scale from negative to positive: afinn
Description Score % Found Words
From 0 (negative) to 10 (positive) 5.962963 8.43%
Percentage of words found per sentiment type (bing): 8.05%
sentiment Percentage
positive 78.3%
negative 21.7%
Percentage of words found per sentiment type (nrc): 13.4%
sentiment Percentage
positive 22.2%
anticipation 20.2%
trust 17.2%
joy 12.1%
surprise 9.1%
negative 7.1%
fear 5.1%
sadness 4.0%
disgust 2.0%
anger 1.0%
Percentage of words found per sentiment type (loughran): 3.45%
sentiment Percentage
negative 40.0%
litigious 26.7%
positive 20.0%
uncertainty 13.3%

[1] “Character sentiment analysis: SARAH” [1] “Total unique words in the text: 206”

Sentiment scale from negative to positive: afinn
Description Score % Found Words
From 0 (negative) to 10 (positive) 4.636364 9.22%
Percentage of words found per sentiment type (bing): 9.22%
sentiment Percentage
positive 52.38%
negative 47.62%
Percentage of words found per sentiment type (nrc): 15.5%
sentiment Percentage
negative 21.1%
positive 17.1%
anger 11.8%
fear 11.8%
sadness 9.2%
anticipation 7.9%
trust 7.9%
joy 6.6%
disgust 5.3%
surprise 1.3%
Percentage of words found per sentiment type (loughran): 5.83%
sentiment Percentage
negative 28.6%
uncertainty 28.6%
litigious 23.8%
positive 14.3%
constraining 4.8%

[1] “Character sentiment analysis: SERENA” [1] “Total unique words in the text: 198”

Sentiment scale from negative to positive: afinn
Description Score % Found Words
From 0 (negative) to 10 (positive) 4.466667 7.07%
Percentage of words found per sentiment type (bing): 4.55%
sentiment Percentage
negative 55.6%
positive 44.4%
Percentage of words found per sentiment type (nrc): 13.6%
sentiment Percentage
positive 20.8%
anticipation 18.1%
trust 16.7%
negative 13.9%
joy 8.3%
fear 6.9%
sadness 5.6%
anger 4.2%
disgust 4.2%
surprise 1.4%
Percentage of words found per sentiment type (loughran): 5.05%
sentiment Percentage
negative 45.5%
uncertainty 36.4%
litigious 9.1%
positive 9.1%

[1] “Character sentiment analysis: MARGOT” [1] “Total unique words in the text: 197”

Sentiment scale from negative to positive: afinn
Description Score % Found Words
From 0 (negative) to 10 (positive) 6.142857 11.7%
Percentage of words found per sentiment type (bing): 8.63%
sentiment Percentage
positive 77.3%
negative 22.7%
Percentage of words found per sentiment type (nrc): 10.7%
sentiment Percentage
positive 28.6%
joy 16.3%
trust 12.2%
surprise 10.2%
fear 8.2%
disgust 6.1%
negative 6.1%
anger 4.1%
anticipation 4.1%
sadness 4.1%
Percentage of words found per sentiment type (loughran): 3.55%
sentiment Percentage
positive 37.5%
litigious 25.0%
negative 25.0%
uncertainty 12.5%

[1] “Character sentiment analysis: BROOKE” [1] “Total unique words in the text: 194”

Sentiment scale from negative to positive: afinn
Description Score % Found Words
From 0 (negative) to 10 (positive) 4.612903 13.9%
Percentage of words found per sentiment type (bing): 14.9%
sentiment Percentage
positive 58.1%
negative 41.9%
Percentage of words found per sentiment type (nrc): 15.5%
sentiment Percentage
negative 20.6%
sadness 12.7%
anger 9.8%
fear 9.8%
positive 9.8%
trust 9.8%
joy 7.8%
disgust 6.9%
surprise 6.9%
anticipation 5.9%
Percentage of words found per sentiment type (loughran): 7.73%
sentiment Percentage
negative 52.9%
positive 29.4%
litigious 11.8%
uncertainty 5.9%

[1] “Character sentiment analysis: PAULETTE” [1] “Total unique words in the text: 225”

Sentiment scale from negative to positive: afinn
Description Score % Found Words
From 0 (negative) to 10 (positive) 5.257143 12.4%
Percentage of words found per sentiment type (bing): 9.78%
sentiment Percentage
positive 59.3%
negative 40.7%
Percentage of words found per sentiment type (nrc): 13.8%
sentiment Percentage
positive 18.7%
joy 15.0%
anticipation 12.1%
negative 12.1%
trust 11.2%
fear 7.5%
disgust 6.5%
sadness 6.5%
surprise 6.5%
anger 3.7%
Percentage of words found per sentiment type (loughran): 4.44%
sentiment Percentage
positive 33.3%
uncertainty 33.3%
negative 25.0%
litigious 8.3%

[1] “Character sentiment analysis: PROFESSOR STROMWELL” [1] “Total unique words in the text: 176”

Sentiment scale from negative to positive: afinn
Description Score % Found Words
From 0 (negative) to 10 (positive) 5.2 10.8%
Percentage of words found per sentiment type (bing): 11.4%
sentiment Percentage
positive 59.1%
negative 40.9%
Percentage of words found per sentiment type (nrc): 18.8%
sentiment Percentage
negative 20.8%
positive 18.2%
trust 15.6%
anticipation 10.4%
anger 7.8%
fear 7.8%
joy 6.5%
sadness 6.5%
surprise 3.9%
disgust 2.6%
Percentage of words found per sentiment type (loughran): 9.66%
sentiment Percentage
litigious 35%
negative 20%
uncertainty 20%
positive 15%
constraining 10%

[1] “Character sentiment analysis: PROFESSOR DONOVAN” [1] “Total unique words in the text: 220”

Sentiment scale from negative to positive: afinn
Description Score % Found Words
From 0 (negative) to 10 (positive) 4.826087 9.09%
Percentage of words found per sentiment type (bing): 6.82%
sentiment Percentage
positive 55%
negative 45%
Percentage of words found per sentiment type (nrc): 19.5%
sentiment Percentage
positive 20%
negative 17%
trust 16%
anticipation 10%
fear 10%
anger 9%
sadness 8%
disgust 4%
joy 3%
surprise 3%
Percentage of words found per sentiment type (loughran): 7.27%
sentiment Percentage
negative 40%
litigious 30%
uncertainty 20%
constraining 5%
positive 5%

[1] “Character sentiment analysis: CHUTNEY” [1] “Total unique words in the text: 92”

Sentiment scale from negative to positive: afinn
Description Score % Found Words
From 0 (negative) to 10 (positive) 4.466667 8.7%
Percentage of words found per sentiment type (bing): 3.26%
sentiment Percentage
negative 75%
positive 25%
Percentage of words found per sentiment type (nrc): 8.7%
sentiment Percentage
positive 22.2%
fear 18.5%
anger 14.8%
negative 14.8%
trust 11.1%
sadness 7.4%
surprise 7.4%
disgust 3.7%

Percentage of words found per sentiment type (loughran): 0%

sentiment Percentage
(no loughran words found)

[1] “Character sentiment analysis: DORKY DAVID” [1] “Total unique words in the text: 76”

Sentiment scale from negative to positive: afinn
Description Score % Found Words
From 0 (negative) to 10 (positive) 4.333333 3.95%
Percentage of words found per sentiment type (bing): 5.26%
sentiment Percentage
negative 50%
positive 50%
Percentage of words found per sentiment type (nrc): 13.2%
sentiment Percentage
fear 18.2%
positive 18.2%
trust 18.2%
anger 9.1%
anticipation 9.1%
negative 9.1%
sadness 9.1%
disgust 4.5%
surprise 4.5%
Percentage of words found per sentiment type (loughran): 9.21%
sentiment Percentage
litigious 42.9%
uncertainty 42.9%
negative 14.3%

[1] “Character sentiment analysis: ENID” [1] “Total unique words in the text: 143”

Sentiment scale from negative to positive: afinn
Description Score % Found Words
From 0 (negative) to 10 (positive) 4.714286 4.9%
Percentage of words found per sentiment type (bing): 8.39%
sentiment Percentage
negative 50%
positive 50%
Percentage of words found per sentiment type (nrc): 11.9%
sentiment Percentage
positive 19.4%
negative 16.7%
anger 13.9%
trust 13.9%
anticipation 11.1%
fear 11.1%
disgust 5.6%
sadness 5.6%
joy 2.8%
Percentage of words found per sentiment type (loughran): 4.2%
sentiment Percentage
negative 28.6%
uncertainty 28.6%
constraining 14.3%
litigious 14.3%
positive 14.3%

[1] “Character sentiment analysis: ENRIQUE” [1] “Total unique words in the text: 79”

Sentiment scale from negative to positive: afinn
Description Score % Found Words
From 0 (negative) to 10 (positive) 5.3125 13.9%
Percentage of words found per sentiment type (bing): 10.1%
sentiment Percentage
negative 50%
positive 50%
Percentage of words found per sentiment type (nrc): 8.86%
sentiment Percentage
positive 22.7%
joy 18.2%
trust 13.6%
negative 9.1%
sadness 9.1%
surprise 9.1%
anger 4.5%
anticipation 4.5%
disgust 4.5%
fear 4.5%
Percentage of words found per sentiment type (loughran): 3.8%
sentiment Percentage
negative 66.7%
uncertainty 33.3%

[1] “Character sentiment analysis: JUDGE” [1] “Total unique words in the text: 67”

Sentiment scale from negative to positive: afinn
Description Score % Found Words
From 0 (negative) to 10 (positive) 4 5.97%
Percentage of words found per sentiment type (bing): 4.48%
sentiment Percentage
positive 66.7%
negative 33.3%
Percentage of words found per sentiment type (nrc): 10.4%
sentiment Percentage
trust 35.7%
positive 28.6%
anger 7.1%
anticipation 7.1%
fear 7.1%
negative 7.1%
sadness 7.1%
Percentage of words found per sentiment type (loughran): 8.96%
sentiment Percentage
litigious 55.6%
negative 33.3%
uncertainty 11.1%

[1] “Character sentiment analysis: MRS. WINDHAM VANDERMARK” [1] “Total unique words in the text: 117”

Sentiment scale from negative to positive: afinn
Description Score % Found Words
From 0 (negative) to 10 (positive) 7 0.855%
Percentage of words found per sentiment type (bing): 3.42%
sentiment Percentage
positive 83.3%
negative 16.7%
Percentage of words found per sentiment type (nrc): 12.8%
sentiment Percentage
positive 22.6%
negative 16.1%
anticipation 9.7%
joy 9.7%
surprise 9.7%
trust 9.7%
anger 6.5%
disgust 6.5%
fear 6.5%
sadness 3.2%
Percentage of words found per sentiment type (loughran): 0.855%
sentiment Percentage
uncertainty 100%

Score per Character over Time

Top 10 Characters

Peak dialogues for the Top 10 Characters: Legally%20Blonde
Character Min_Max Dialogue
ELLE MIN She could use some mascara and some serious highlights, but she’s not completely unfortunate-looking.
ELLE MAX Is everything okay?
EMMETT MIN I don’t – Do that stuff.
EMMETT MAX Good luck.
DONOVAN MIN What are you talking about?
DONOVAN MAX You’re a beautiful girl, Elle.
WARNER MIN You got into Harvard Law?
WARNER MAX Come on, we can make room for one more.
SARAH MIN Have you ever noticed that Donovan never asks Warner to bring him coffee? He’s asked me at least a dozen times.
SARAH MAX This should be amusing.
SERENA MIN There he is!
SERENA MAX What’s the thing that always makes us feel better, no matter what?
MARGOT MIN Why else would he be taking you to The Ivy? You’ve been dating for a year – it’s not like he’s trying to impress you.
MARGOT MAX Jesus. Talk about a Rock. You must be better in bed than you look.
PAULETTE MIN I’m taking the dog dumbass. C’mere, baby, Mommy’s here!
PAULETTE MAX Elle, you’ve changed my life. You are the kindest, most wonderful angel. Without you, I wouldn’t have Rufus or a dinner date. Now go and share your goodness with the world while I stay here and have my hoo-hoo waxed.
BROOKE MIN Hey – I know you.
BROOKE MAX But is he an ass that’s gonna win my case?
PROFESSOR STROMWELL MIN If you let one stupid prick ruin your life, you’re not the girl I thought you were.
PROFESSOR STROMWELL MAX “An image and a good hook can get you into a room – but something has to keep you in that room.”

Top 4 Pairs

Peak dialogues for the Top 4 Pairs: Legally%20Blonde
Pair Min_Max Dialogue
ELLE_EMMETT MIN You’re serious?
ELLE_EMMETT MAX Thanks for the backup.
ELLE_SARAH MIN The idiot speaks.
ELLE_SARAH MAX Maybe you should sleep with the judge too. Then we can win the case.
ELLE_WARNER MIN I just don’t want to see you get your hopes up. You know how you get.
ELLE_WARNER MAX How was I supposed to know what kind of shoes you had on?
ELLE_SERENA MIN Bring her, too. C’mon. You can wear one of Elle’s outfits.
ELLE_SERENA MAX What’s the thing that always makes us feel better, no matter what?

Association Rules between Words (Market Basket)

The whole movie

## [1] "Average lift of the association rules: 22.3450974069944"
## [1] "Standard deviation of the lift of the association rules: 15.4146554119165"
## [1] "Lift deciles:"
##        10%        20%        30%        40%        50%        60% 
##   3.254967   7.503817  11.702381  14.455882  20.914894  25.868421 
##        70%        80%        90%       100% 
##  32.766667  37.807692  40.958333 196.600000

Histogram data: Lift, movie: LEGALLY BLONDE
Number of Dialogues Lift Min Lift Max
24,514 -3 3
31,078 3 10
44,326 10 17
32,304 17 24
22,552 24 30
26,030 30 37
## [1] "Average leverage of the association rules: 0.00797707699313047"
## [1] "Standard deviation of the leverage of the association rules: 0.0063592855856971"
## [1] "Leverage deciles:"
##         10%         20%         30%         40%         50%         60% 
## 0.004288572 0.004729434 0.004843272 0.004946760 0.004993330 0.005917484 
##         70%         80%         90%        100% 
## 0.007923095 0.011580386 0.014849595 0.128001043

Histogram data: Leverage, movie: LEGALLY BLONDE
Number of Dialogues Leverage Min Leverage Max
10,368 -0.0022 0.0022
133,500 0.0022 0.0066
39,562 0.0066 0.011
26,028 0.011 0.015
7,912 0.015 0.02
5,684 0.02 0.024
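The lift and leverage statistics above follow the standard association-rule definitions (the same metrics mlxtend's `association_rules` computes). A minimal sketch with made-up supports, not the movie's actual data:

```python
# Standard association-rule metrics over basket supports (fractions of baskets).

def lift(support_ab, support_a, support_b):
    # > 1 means A and B co-occur more often than independence would predict
    return support_ab / (support_a * support_b)

def leverage(support_ab, support_a, support_b):
    # Absolute gap between observed co-occurrence and the independence expectation
    return support_ab - support_a * support_b

# Made-up example: two tokens each in 5% of baskets, together in 2% of baskets
print(lift(0.02, 0.05, 0.05))      # ~8.0
print(leverage(0.02, 0.05, 0.05))  # ~0.0175
```

Rare items inflate lift (a small denominator), which is why the top decile above reaches 196.6 while leverage stays tiny: leverage is bounded by the joint support itself.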

Top 10 Characters

Top 4 Pairs

Character Relationship Analysis (Pagerank)

Pagerank: Legally Blonde.

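The Pagerank figures referenced above (rendered elsewhere; not included in this export) rank characters by their weighted co-occurrence. A minimal power-iteration sketch over a hypothetical three-character graph — the edge weights below are invented for illustration, not the movie's actual co-occurrence matrix:

```python
# PageRank by power iteration on a weighted, directed co-occurrence graph.

def pagerank(weights, damping=0.85, iters=100):
    nodes = list(weights)
    ranks = {n: 1 / len(nodes) for n in nodes}
    for _ in range(iters):
        new = {}
        for n in nodes:
            # Rank flowing into n from each m, proportional to edge weight m -> n
            inflow = sum(ranks[m] * weights[m].get(n, 0) / sum(weights[m].values())
                         for m in nodes if weights[m])
            new[n] = (1 - damping) / len(nodes) + damping * inflow
        ranks = new
    return ranks

shared_lines = {  # hypothetical edge weight = dialogue lines in shared scenes
    "ELLE":   {"WARNER": 40, "EMMETT": 60},
    "WARNER": {"ELLE": 40},
    "EMMETT": {"ELLE": 60},
}
ranks = pagerank(shared_lines)  # the hub character ends up ranked highest
```

The `_nodes.csv` matrix written earlier in the notebook is exactly this kind of weighted graph, so the same iteration (or `networkx.pagerank`) applies directly to it.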